ABSTRACT 

The results of classifying test items on the basis of 
their Mantel-Haenszel (MH) alpha estimates were compared to the 
results of classifying these items using an item response theory 
(IRT) based procedure involving the comparison of item difficulties 
in the interest of identifying the alpha value that maximized the 
decision concordance between the two methods. The data consisted of 
candidates' responses to 299 scored items on an examination for 
professional licensure. A total of 68,458 candidates took this 
examination in 1988. The candidates' ethnicity was determined by 
their self-classifications into one of seven categories. A total of 
47,573 candidates classified themselves in Ethnic Group 1; 6,486 in 
Ethnic Group 2; 5,466 in Ethnic Group 3; 2,004 in Ethnic Group 4; 
1,014 in Ethnic Group 5; 486 in Ethnic Group 6; 307 in Ethnic Group 
7; 726 as "other"; and the remaining 4,396 candidates did not specify 
their ethnicity. All candidates except the 4,396 who did not classify 
themselves were used for the MH analyses; for the IRT analyses, 
random samples of candidates were drawn from the four largest ethnic 
groups, while all candidates from the smaller groups were used. 
Candidates were divided into score groups. In both analyses, the 
majority ethnic group became the reference group and the other ethnic 
groups were designated focal. The MH estimate of alpha indicated that 
no differential item functioning was detected. Statistics resulting 
from IRT methods correlated highly with MH alpha values. Using the MH 
method in conjunction with IRT methods resulted in cutscores with a 
high level of decision concordance with advantages over traditional 
methods of using a significance level to establish an alpha 
criterion. Eight tables and eight figures provide study data. 
(SLD) 
The problem of assessing bias in test items has concerned measurement 
specialists for many years, and they have proposed a variety of methods for 
[illegible]. Berk (1982), Cole and Moss (1989), and [citation illegible] have 
provided some of the most recent reviews of these methods. 

Of the methods available, the Mantel-Haenszel (MH) method has recently 
received a great deal of interest as a practical means of assessing 
[illegible]. As described by Holland and Thayer (1986), this method involves 
computing for the majority group and minority group (called the reference 
group and focal group, respectively) the odds ratio (called alpha) of their 
success on an item across score groups in which the group members' ability 
levels are held constant. If an item is not functioning differentially, the 
odds of success are the same for the reference group and focal group, and 
alpha (α) will be equal to 1.0. To the degree that the performance of the two 
groups differs, alpha will deviate from 1.0, and the item can be said to 
exhibit differential item functioning (DIF). 

An important question for users of the MH method who must analyze large 
samples is what constitutes a meaningful amount of DIF. The significance 
level associated with a chi-square test of the null hypothesis can be used to 
distinguish statistically significant levels of alpha (see Holland & Thayer, 
1986), but this significance test is sensitive to the effects of sample size, 
as significance tests generally are. More specifically, if large enough 
samples are used, the null hypothesis can be rejected even when the 
differences in the item performance of the majority [...] 

The rationale for this strategy is that the flaws of 
one method are often the strength of another, and, by 
combining methods, observers can achieve the best of 
each while overcoming their unique deficiencies. 
(p. 302) 

More specifically, in this study the results of classifying items on the 
basis of their MH alpha estimates were compared to the results of classifying 
these items using an IRT-based procedure involving the comparison of item 
difficulties (b-values) in the interest of identifying the alpha value 
that maximized the decision concordance between the two methods. 

Method 

Description of Sample and Test Data 

The data analyzed consisted of candidate responses to 299 scored items 
on an examination for professional licensure prepared by CTB/McGraw-Hill. A 
total of 68,458 candidates took this examination in 1988. The candidates' 
ethnicity was determined by their responses to a demographic question in which 
they were asked to classify themselves in terms of one of seven ethnic 
categories. A total of 47,573 candidates classified themselves in Ethnic 
Group 1; 6,486 candidates classified themselves in Ethnic Group 2; 5,466 in 
Ethnic Group 3; 2,004 in Ethnic Group 4; 1,014 in Ethnic Group 5; 486 in 
Ethnic Group 6; 307 in Ethnic Group 7. An additional 726 candidates 
classified themselves as "Other," and the remaining 4,396 candidates did not 
specify their ethnicity. 

For the Mantel-Haenszel analyses, all candidates who classified 
themselves were used. For the IRT analyses to be described, a random sample 
of 500 candidates was drawn from Ethnic Group 1, the majority group, and 
random samples of 1000 candidates were drawn from Ethnic Groups 2, 3, and 4, 
the three largest minority groups. All candidates who classified themselves 
in Ethnic Groups 5, 6, and 7 or as Other were used in the analyses. 

Procedure 

Both the Mantel-Haenszel and IRT-based analyses to be described entail 
comparisons of the item performance of two groups, a reference group and a 
focal group. In this study, the candidates in Ethnic Group 1, the majority 
group, were used as the reference group, and the seven other candidate groups 
each were designated a focal group. 

Mantel-Haenszel Analysis. To use the Mantel-Haenszel (MH) procedure, the 
reference group and focal groups are matched on ability. To match the groups, 
the total test score typically is used (Holland & Thayer, 1986) to sort 
examinees into score groups. Candidates in a given score group are then 
classified in terms of whether they answer each item correctly or incorrectly. 
In Figure 1, this type of classification of responses to an item is shown for 
score group j. 

Insert Figure 1 about here 

In Figure 1, T_j refers to the total number of candidates in score group j; 
A_j, B_j, C_j, and D_j to the number of candidates in each of the four cells; 
and T_1j, T_0j, T_Rj, and T_Fj to the marginals. 

The MH estimate of alpha (α), which expresses the common odds ratio of 
success of the two groups across all score groups, can be defined as 

$$\hat{\alpha}_{MH} = \frac{\sum_j A_j D_j / T_j}{\sum_j B_j C_j / T_j} \qquad (1)$$

Alpha can vary between 0 and ∞. As noted previously, when α = 1 the odds for 
success are the same in the reference group and focal group. That is, the 
reference group and the focal group demonstrate the same performance on an 
item. This finding indicates that no differential item functioning (DIF) or 
potential bias has been detected. 
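
As a rough illustration of Equation 1, the following sketch computes the MH alpha for one item from a list of 2 x 2 tables, one per score group. The cell layout is an assumption, since Figure 1 is not reproduced here: A and B are taken to be the reference group's correct and incorrect counts, and C and D the focal group's correct and incorrect counts. The function name `mh_alpha` is hypothetical.

```python
from typing import Sequence, Tuple

# One 2x2 table per score group j, given as (A_j, B_j, C_j, D_j).
# Assumed layout (Figure 1 is not available in this copy):
#   A_j = reference group, item correct    B_j = reference group, item incorrect
#   C_j = focal group, item correct        D_j = focal group, item incorrect
Table = Tuple[float, float, float, float]


def mh_alpha(tables: Sequence[Table]) -> float:
    """Mantel-Haenszel common odds ratio across score groups (Equation 1)."""
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in tables:
        total = a + b + c + d  # T_j
        if total == 0:
            continue  # an empty score group contributes nothing
        numerator += a * d / total
        denominator += b * c / total
    if denominator == 0:
        raise ValueError("alpha is undefined: sum of B_j * C_j / T_j is zero")
    return numerator / denominator


# Equal odds of success in both groups give alpha = 1 (no DIF detected).
no_dif = [(40, 10, 20, 5), (30, 30, 10, 10)]
print(mh_alpha(no_dif))
```

Values above 1 indicate that, within matched score groups, the odds of success are higher for the reference group than for the focal group under the assumed cell layout.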

In this study, candidates' total raw scores on the examination were used 
to divide the candidates into score groups. A total of 13 score groups were 
constructed for the analyses comparing the majority group, Ethnic Group 1, 
with the four largest minority groups, Ethnic Groups 2 through 5. At least 50 
candidates fell in each of these 13 score groups, which had the following raw 
score ranges: 135-143, 144-152, 153-156, 157-161, 162-168, 169-171, 172-175, 
176-178, 179-182, 183-187, 188-194, 195-202, and 203-221. Nine score groups 
were constructed for the analyses comparing Ethnic Group 1 with the three 
smallest candidate groups, Ethnic Groups 6, 7, and Other. Fewer score groups 
were used for these analyses in order to ensure that there were enough 
candidates in each score group for adequate matching to occur. At least 22 
candidates fell in each of these nine score groups, which had the following 
ranges: 157-165, 166-173, 174-181, 182-186, 187-193, 194-199, 200-203, 
204-211, and 212-221. Candidates scoring below and above the listed raw score 
ranges were excluded from the analysis because their disparate scores and low 
counts would not permit adequate matching. 
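
The assignment of candidates to score groups can be sketched as a simple lookup. The boundaries below are the raw score ranges listed in the text; the function name `score_group` and the use of `None` for excluded candidates are illustrative choices, not part of the original analysis.

```python
from bisect import bisect_right
from typing import Optional, Sequence

# Lower bounds of the raw score ranges listed in the text.
LOWER_BOUNDS_13 = [135, 144, 153, 157, 162, 169, 172, 176, 179, 183, 188, 195, 203]
LOWER_BOUNDS_9 = [157, 166, 174, 182, 187, 194, 200, 204, 212]
MAX_INCLUDED_SCORE = 221  # upper limit of the highest score group in both schemes


def score_group(raw_score: int, lower_bounds: Sequence[int],
                max_score: int = MAX_INCLUDED_SCORE) -> Optional[int]:
    """Return the 1-based score group, or None if the candidate is excluded."""
    if raw_score < lower_bounds[0] or raw_score > max_score:
        return None  # scores outside the listed ranges were excluded
    return bisect_right(lower_bounds, raw_score)


print(score_group(150, LOWER_BOUNDS_13))  # falls in the 144-152 range
print(score_group(130, LOWER_BOUNDS_13))  # below the lowest range, excluded
```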
The IRT-Based Procedure 

The IRT-based method used in the study was derived from Lord (1980, pp. 
219-220), and it entailed four steps. First, the Rasch item difficulty 
(b-parameter) for each item was estimated using LOGIST 5 (Wingersky, Barton, & 
Lord, 1982) and the sample of 500 candidates randomly drawn from the majority 
reference group. Rasch ability estimates for these candidates generated by 
LOGIST were standardized, that is, scaled to have a mean of 0.0 and a standard 
deviation of 1.0. Then item difficulties were re-estimated seven times, using 
the responses of one of the seven focal groups for each estimation. The seven 
sets of item difficulties that were estimated for the seven focal groups 
subsequently were rescaled to place these sets on the same scale as that of 
the item difficulties estimated for the reference group. Finally, to make 
each of the seven reference group-focal group comparisons of item performance, 
a t-statistic for each item was calculated to assess the difference between 
the difficulty of the item for the reference group and for the focal group. 
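The t-statistic is expressed as 

$$t_i = \frac{\hat{b}_{Ri} - \hat{b}_{Fi}}{\sqrt{\operatorname{var}(\hat{b}_{Ri}) + \operatorname{var}(\hat{b}_{Fi})}}$$

where $\hat{b}_{Ri}$ and $\hat{b}_{Fi}$ are the estimated item difficulty of item i for the 
reference group and the focal group, respectively, and $\operatorname{var}(\hat{b}_i)$ is expressed as 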

1 
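
The t-statistic for a single item can be sketched as follows. Because the variance expression is cut off in this copy, the sketch simply takes the two sampling variances as inputs rather than computing them; the function name `difficulty_t` is hypothetical.

```python
import math


def difficulty_t(b_ref: float, b_focal: float,
                 var_ref: float, var_focal: float) -> float:
    """t-statistic for the difference between two estimated item difficulties.

    b_ref, b_focal: difficulty estimates on a common scale.
    var_ref, var_focal: sampling variances of those estimates (supplied by the
    caller; their formula is not recoverable from this copy of the paper).
    """
    pooled = var_ref + var_focal
    if pooled <= 0:
        raise ValueError("the summed variances must be positive")
    return (b_ref - b_focal) / math.sqrt(pooled)


print(difficulty_t(0.5, -0.1, 0.02, 0.02))
```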
Procedure for Identifying Cutscores 

For each reference group-focal group comparison, the MH estimates of 
alpha obtained for all 299 items in the examination were plotted against the 
t-statistics obtained for these items in a bivariate plot. Subsequently the 
bivariate plot was partitioned by selecting a MH cutscore and a t-statistic 
cutscore, appearing on the x and y axis, respectively, and drawing 
perpendicular lines from the axes at these cutscores through the plot; the 
intersection of these lines created four quadrants. Quadrant 1 contained 
items with alphas and t-statistics that were greater than or equal to the 
selected alpha and t-statistic cutscores, respectively. These items will be 
referred to as "potentially biased" in this paper. Quadrant 3 contained items 
with both alphas and t-statistics less than the selected alpha and t-statistic 
cutscores, respectively. Thus Quadrants 1 and 3 contained items that were 
consistently classified on the basis of the two cutscores. In contrast, 
Quadrant 2 contained items with alphas greater than or equal to the selected 
alpha cutscore and t-statistics less than the selected t-statistic cutscore; 
Quadrant 4 contained items with alphas less than the selected alpha cutscore 
and t-statistics greater than or equal to the selected t-statistic cutscore. 
Thus, Quadrants 2 and 4 contained items that were inconsistently classified on 
the basis of the two cutscores. 
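
The quadrant definitions above translate directly into a small classification routine. This is a minimal sketch; the function name `quadrant_counts` and the dictionary return type are illustrative.

```python
from typing import Dict, Sequence


def quadrant_counts(alphas: Sequence[float], t_stats: Sequence[float],
                    alpha_cut: float, t_cut: float) -> Dict[int, int]:
    """Count items in each plot quadrant for one pair of cutscores."""
    counts = {1: 0, 2: 0, 3: 0, 4: 0}
    for alpha, t in zip(alphas, t_stats):
        high_alpha = alpha >= alpha_cut
        high_t = t >= t_cut
        if high_alpha and high_t:
            counts[1] += 1   # flagged by both methods ("potentially biased")
        elif high_alpha:
            counts[2] += 1   # flagged by MH alpha only
        elif high_t:
            counts[4] += 1   # flagged by the t-statistic only
        else:
            counts[3] += 1   # flagged by neither method
    return counts


# Four hypothetical items, one per quadrant.
print(quadrant_counts([2.0, 2.0, 1.0, 1.0], [3.0, 1.0, 1.0, 3.0], 1.5, 2.0))
```

Items in Quadrants 1 and 3 are the consistently classified ones; their combined count divided by the number of items gives the concordance proportion.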

As one measure of concordance between the results of the MH and IRT- 
based procedures, counts were made of the number of items falling in Quadrants 
1 and 3 combined. By selecting different MH values and t-statistic values and 
counting the items in the two quadrants, each plot was searched systematically 
using a computer algorithm to find the combination of MH value and t-statistic 
value that produced the largest number of items falling in the two quadrants 
combined. On the alpha scale, the cutscores tried were 1.40, 1.41, 1.42, 
through 2.90 in steps of 0.01. Each of these cutscores was tried in 
combination with t-statistic cutscores of 1.96, 1.97, 1.98 through 7.56 
(Footnote 1) in steps of 0.01, and the sum of the items falling in Quadrant 1 
plus Quadrant 3 for each pair of cutscores was calculated. The ratio of this 
to the total number of items analyzed was called the concordance proportion. 

Constraints were imposed on the algorithm used to identify the pair of 
cutscores that produced the maximum count of items falling in Quadrant 1 plus 
Quadrant 3. These constraints were necessary because a concordance proportion 
of 1.00 could be obtained simply by making the alpha and t-statistic cutscores 
so extreme that all items analyzed fell in Quadrants 1 or 3. The first set of 
constraints specified that when a new pair of cutscores was selected at least 
three items should be found in Quadrant 1 (Footnote 2), at least two items 
should be found in Quadrant 2, and at least two items in Quadrant 4. This set 
of modest constraints, which required that about 1% of the items appear in 
Quadrant 1 and slightly less than 1% appear in Quadrant 2 and Quadrant 4, 
served to prevent the selection of extreme cutscores. The second constraint 
specified that when a new pair of cutscores produced a higher concordance 
proportion than a previous pair, the new pair could be considered a new 
maximum only when this new pair produced a proportional loss of items in 
Quadrant 1 that was no greater than the proportional loss in erroneously 
classified items in Quadrant 2 plus Quadrant 4. In effect, this second 
constraint ensured that the increase in the number of items in Quadrant 3 had 
to be due to reductions in erroneous classifications (in Quadrant 2 plus 
Quadrant 4) that were at least as great as any reductions in the number of 
Quadrant 1 items. In the event that more than one pair of cutscores produced 
the maximum concordance proportion, the pair of cutscores consisting of the 
smallest alpha and t-statistic was selected as the maximizing cutscore. 

1 These statistics correspond to probabilities that range from p = .006 
to p < .000001, one-tailed. 

2 A baseline comparison conducted [illegible] majority group to two 
comparison groups [illegible] level five items [illegible] biased by chance 
alone using the t-statistic employed in the IRT method. A baseline comparison 
using the MH chi-square test indicated that [illegible] items [illegible] 
chance alone. 
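
The search described above can be sketched as a constrained grid search. This is one reading of the procedure, not the authors' program: it assumes the grid is scanned in ascending order of alpha and then of t, that "previous pair" means the best pair found so far, and that ties are resolved by keeping the pair found first (the smallest alpha, then the smallest t). The names `count_quadrants` and `maximizing_cutscores` are hypothetical.

```python
from typing import Optional, Sequence, Tuple

Counts = Tuple[int, int, int, int]  # items in Quadrants 1, 2, 3, 4


def count_quadrants(alphas: Sequence[float], t_stats: Sequence[float],
                    alpha_cut: float, t_cut: float) -> Counts:
    """Count items per quadrant using the definitions given in the text."""
    q1 = q2 = q3 = q4 = 0
    for alpha, t in zip(alphas, t_stats):
        if alpha >= alpha_cut and t >= t_cut:
            q1 += 1
        elif alpha >= alpha_cut:
            q2 += 1
        elif t >= t_cut:
            q4 += 1
        else:
            q3 += 1
    return q1, q2, q3, q4


def maximizing_cutscores(alphas: Sequence[float], t_stats: Sequence[float],
                         alpha_grid: Sequence[float], t_grid: Sequence[float],
                         min_q1: int = 3, min_q2: int = 2, min_q4: int = 2
                         ) -> Optional[Tuple[float, float, Counts]]:
    """Return (alpha_cut, t_cut, quadrant counts) maximizing Q1 + Q3."""
    best: Optional[Tuple[float, float, Counts]] = None
    for alpha_cut in alpha_grid:        # assumed ascending
        for t_cut in t_grid:            # assumed ascending
            q1, q2, q3, q4 = count_quadrants(alphas, t_stats, alpha_cut, t_cut)
            # First set of constraints: minimum counts in Quadrants 1, 2, 4.
            if q1 < min_q1 or q2 < min_q2 or q4 < min_q4:
                continue
            if best is None:
                best = (alpha_cut, t_cut, (q1, q2, q3, q4))
                continue
            b1, b2, b3, b4 = best[2]
            # Only a strictly higher concordance can replace the current best,
            # so ties keep the earlier (smaller) pair of cutscores.
            if q1 + q3 <= b1 + b3:
                continue
            # Second constraint: proportional loss in Quadrant 1 must not
            # exceed the proportional loss in Quadrants 2 plus 4.
            loss_q1 = (b1 - q1) / b1
            loss_errors = ((b2 + b4) - (q2 + q4)) / (b2 + b4)
            if loss_q1 <= loss_errors:
                best = (alpha_cut, t_cut, (q1, q2, q3, q4))
    return best


# Small hypothetical example with 13 items and a coarse grid.
alphas = [2.5, 2.4, 2.6, 2.2, 2.3, 2.1, 1.0, 1.1, 1.0, 0.9, 1.2, 1.5, 1.3]
t_stats = [5.0, 4.5, 6.0, 4.0, 1.0, 0.5, 4.2, 5.5, 0.0, -1.0, 1.0, 2.5, 2.5]
print(maximizing_cutscores(alphas, t_stats, [1.4, 1.6, 2.0], [1.96, 3.5]))
```

To mimic the grids described in the text, `alpha_grid` could be built as `[round(1.40 + 0.01 * i, 2) for i in range(151)]` and `t_grid` as `[round(1.96 + 0.01 * i, 2) for i in range(561)]`; the starting alpha value of 1.40 is a reading of a garbled figure in this copy.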

Results 

In Table 1 are shown the percentages of candidates in each of the 13 raw 
score groups constructed to conduct the MH analyses on the five largest ethnic 
groups. In Table 2 are shown the percentages of candidates in each of the nine 
raw score groups constructed to conduct the MH analyses on the three smallest 
ethnic groups. Also noted in these two tables are the percentages of 
candidates in each ethnic group who had extreme raw scores and were excluded 
from the MH analyses. 

The data in Tables 1 and 2 show that candidates in Ethnic Groups 2, 5, 
and 6 were fairly evenly distributed over the score groups included in the 
analyses. The scores for candidates in Ethnic Groups 1, 3, 4, and 7 were more 
heavily concentrated in the higher raw score groups. The scores for 
candidates in the Other group were somewhat more heavily concentrated in the 
lower raw score groups. 

Table 3 provides summary statistics describing the alpha and t-statistic 
values computed for each of the seven reference-focal group comparisons that 
were analyzed. With respect to the alpha values, the table shows that the 
reference-focal group comparisons having low alphas also had low 
variability in their alpha values. In contrast, reference-focal group 
comparisons having relatively higher mean alphas also had higher variability 
in their alpha values. Specifically, low mean and median alpha values between 
.98 and 1.10 were obtained for the comparisons involving Ethnic Group 1 vs. 
Ethnic Groups 3, 4, 6, 7, and Other. Relatively low variability in the range 
of alpha values also was found in these comparisons, suggesting that low 
levels of DIF as measured by the MH alpha were evident in these comparisons. 
Notably higher mean alphas and greater variability were found for the 
comparisons involving Ethnic Group 1 vs. Ethnic Groups 2 and 5. The medians 
for these two comparisons were very close to 1.00 and lower than the means, 
indicating positive skews in the distribution of alpha values and suggesting 
the presence of some items with higher levels of DIF in these two comparisons. 

With respect to the t-statistics obtained for the seven reference-focal 
group comparisons, a similar pattern of findings was observed. Specifically, 
mean t-statistics between -.03 and .14 were obtained for the comparisons 
involving Ethnic Group 1 vs. Ethnic Groups 3, 4, 6, and 7, and the variability 
in the t-statistics obtained for these comparisons was also relatively low. 
Thus, low levels of DIF as measured by the t-statistic appear to be evident in 
these four comparisons, as was the case when their levels of alpha were 
analyzed. Notably more negative mean t-statistics and greater variability 
were observed for the comparisons involving Ethnic Group 1 vs. Ethnic Groups 
2, 5, and Other. In all comparisons except Ethnic Group 1 vs. Ethnic Groups 
[illegible] and 6, the median t-statistics were somewhat higher than the 
means. This suggests somewhat negative skews in the distribution of these 
statistics obtained for each comparison, particularly for the comparisons of 
Ethnic Group 1 vs. Ethnic Groups 2 and 5. For these two comparisons, it 
appears again that there are some items with higher levels of DIF as measured 
by the t-statistics. 

In Table 4 are the correlations between the alpha values and t-statistics 
calculated for each of the seven comparisons. These correlations 
were consistently high, ranging from .75 to .8[illegible], suggesting a 
consistently strong and positive relationship between the values produced by 
the MH and IRT methods. 

Bivariate plots of the alpha and t-statistics calculated for the 299 
items and seven reference-focal group comparisons are provided in Figures 2 
through 8. It should be noted that because of constraints inherent in the 
plotter, the point placement must be regarded as approximate. In general 
these plots show a somewhat curvilinear relationship between the alpha and t- 
statistics calculated for each comparison, which is the expected relationship 
between two variates, one of which is in an antilog relationship to the other. 
It should be noted that there was evidence of a more pronounced curvilinear 
relation between the two statistics in Figures 2 and 5, which involved the 
comparisons of Ethnic Group 1 vs. Ethnic Groups 2 and 5. 

Provided in Table 5 for each of the seven comparisons are the cutscores 
that maximized the proportion of concordant ratings resulting from use of the 
two methods. These cutscores have also been drawn on the plots in Figures 2 
through 8. The maximizing alpha cutscores varied somewhat across the seven 
comparisons, ranging from 1.56 to 1.99. The maximizing t-statistic 
cutscores varied more substantially, ranging from 2.52 to 5.92. A comparison 
of the maximizing cutscores obtained for each reference-focal group comparison 
with the alpha and t-statistics reported in Table 3 shows a pattern: Lower 
cutscores were derived for those comparisons with lower and less variable 
levels of DIF, and higher cutscores were derived for those comparisons that 
appeared to have some items with more substantial amounts of DIF. 

Table 6 shows for each reference-focal group comparison the distribution 
of items in the four plot quadrants at the maximizing cutscores and measures 
of the degree of concordance between the item classifications resulting from 
use of these cutscores. With respect to the distributions of items across the 
four quadrants, note should be made of the numbers of items that appeared in 
the first quadrant across the seven comparisons. Table 6 shows that these 
numbers ranged from four items to 52 items, with the highest numbers of items 
observed for the comparisons involving Ethnic Group 1 vs. Ethnic Groups 2 and 
5. It was these comparisons that both the alpha and t-statistics 
independently suggested had items with higher levels of DIF. 

Two measures of concordance for the seven comparisons of interest are 
reported in Table 6. The first measure, called the concordance proportion 
(Po) in this paper, was first suggested by Hambleton and Novick (1974) as a 
measure of the consistency of mastery/non-mastery classifications. In the 
current study it was used to assess the proportion of items that were 
consistently classified by the two methods at the maximizing cutscores. Table 
6 shows that the maximizing cutscores generally produced very high concordance 
rates, which ranged between .94 and .99. 

The second measure, Cohen's (1960) kappa (k), has been recommended in the 
literature (Swaminathan, Hambleton, & Algina, 1974) as a useful index of 
decision consistency. It indicates the degree to which the proportion of 
decisions found to be consistent over methods exceeds the proportion to be 
expected by chance, that is, when the two decision methods are statistically 
independent. As has been noted by [citations illegible], the relationship 
between Po and k is complex because the two statistics are affected 
differently by (1) the shapes of the two distributions of scores being 
analyzed, (2) the location of the cutscores in these two distributions, and 
(3) the correlations between the two distributions of scores. 

The results for k reported in Table 6 were somewhat lower than those 
[illegible] Group 4 comparison, the k values were high. The lower value of k 
for this comparison appears to be due to the unusual degree of 
misclassification that is apparent when the numbers of items in plot 
quadrants 1 and 2 are compared. As the table shows, of the 10 items that 
exceeded the maximizing alpha cutscore, only four of these items also 
exceeded the maximizing t-statistic [illegible] the maximizing t-statistic 
cutscore and fell in Quadrant 2. 
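
Both concordance measures can be computed directly from the four quadrant counts. The sketch below uses the standard definition of Cohen's kappa for a 2 x 2 agreement table; the function name `concordance_and_kappa` is hypothetical. The example counts are those reported in Table 6 for the Ethnic Group 1 vs. Ethnic Group 4 comparison (4, 6, 287, and 2 items in Quadrants 1 through 4).

```python
from typing import Tuple


def concordance_and_kappa(q1: int, q2: int, q3: int, q4: int) -> Tuple[float, float]:
    """Return (Po, kappa) from the counts in Quadrants 1 through 4."""
    n = q1 + q2 + q3 + q4
    p_observed = (q1 + q3) / n                # concordance proportion Po
    p_alpha_flag = (q1 + q2) / n              # proportion flagged by MH alpha
    p_t_flag = (q1 + q4) / n                  # proportion flagged by the t-statistic
    # Agreement expected by chance if the two classifications were independent.
    p_chance = p_alpha_flag * p_t_flag + (1 - p_alpha_flag) * (1 - p_t_flag)
    if p_chance == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    kappa = (p_observed - p_chance) / (1 - p_chance)
    return p_observed, kappa


po, kappa = concordance_and_kappa(4, 6, 287, 2)
print(round(po, 2), round(kappa, 2))  # 0.97 0.49, the values shown in Table 6
```

The example shows how a comparison can have a high Po and a much lower kappa: when few items are flagged by either method, chance agreement is already close to 1, so the six Quadrant 2 items weigh heavily.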

Table 7 provides a summary of selected information presented in Tables 
[illegible]. In this table, the reference-focal group comparisons were first 
ranked [illegible] the distributions of alpha values calculated for the seven 
comparisons. The comparisons were also ranked from low to high in order of 
the value of the t-statistic at the third quartile in the distributions of 
t-statistics calculated for the seven comparisons. As is evident, the rank 
orders of the comparisons based on the two statistics were nearly identical, 
indicating that the statistics calculated using the two methods similarly 
distinguished between the seven comparisons in terms of the degree of 
potential bias in the items analyzed. These findings corroborate the high 
correlations reported in Table 4. The similarity between the ranks of the 
third quartile values of the two statistics and the ranks of the number of 
items in the first quadrant of the plot further indicated that the two 
methods produced cutscores that ordered the comparisons similarly in terms of 
their potential bias. That is, comparisons independently identified by the 
two methods as having low levels of potential bias (i.e., low alphas and 
t-statistics at the third quartile) also were found to have few items in the 
first quadrant. In contrast, comparisons independently identified by the two 
methods as having relatively higher levels of potential bias (i.e., 
relatively high alphas and t-statistics at the third quartile) were found to 
have substantially more items in the first quadrant. 
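
The ranking comparison can be illustrated with a short sketch. The numbers are the third-quartile values and first-quadrant counts as they can be read from Tables 3 and 6 in this copy, so they should be treated as approximate readings of a degraded source rather than verified data; the helper `ranked` is hypothetical.

```python
from typing import Dict, List

# Focal groups compared with Ethnic Group 1. Values as read from Tables 3 and 6
# of this copy (a degraded scan), so they are illustrative readings only.
alpha_q3: Dict[str, float] = {"2": 1.47, "3": 1.27, "4": 1.17, "5": 1.65,
                              "6": 1.27, "7": 1.14, "Other": 1.33}
t_q3: Dict[str, float] = {"2": 3.04, "3": 1.90, "4": 1.50, "5": 3.39,
                          "6": 1.94, "7": 0.80, "Other": 2.44}
items_in_q1: Dict[str, int] = {"2": 30, "3": 7, "4": 4, "5": 52,
                               "6": 16, "7": 5, "Other": 17}


def ranked(values: Dict[str, float]) -> List[str]:
    """Focal groups ordered from low to high (ties keep insertion order)."""
    return sorted(values, key=values.get)


print(ranked(alpha_q3))
print(ranked(t_q3))
print(ranked(items_in_q1))
```

With these readings the two third-quartile orderings coincide, and the ordering by first-quadrant counts differs only in the two comparisons with the fewest flagged items.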

Finally, Table 8 shows what results would obtain if significance 
levels for each of the two methods used to detect DIF were used in lieu of a 
cutscore derived using the multiple method approach described in this paper. 
Using the MH chi-square test, between 11 and 132 items would be flagged for 
DIF against a minority group at p < .01. If a significance level of p < .01 
for the t-statistics were used, between 5 and 94 items would be flagged for 
this DIF. In contrast, as shown in Table 6, between 4 and 52 items were 
flagged for DIF using the multiple method approach. It should also be noted 
that the numbers of items shown to be flagged for DIF in Table 8 under each 
method were strongly related to the sizes of the minority groups that were 
analyzed in the seven comparisons, whereas the numbers of items flagged for 
DIF using the maximizing cutscores were not related to sample size. 
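
The dependence of the significance test on sample size can be seen in a small sketch of the MH chi-square statistic. The continuity-corrected form used here is the one commonly given for the Mantel-Haenszel test; whether the study used exactly this form is an assumption, as are the cell layout (A and B for the reference group, C and D for the focal group) and the function name `mh_chi_square`.

```python
from typing import Sequence, Tuple

Table = Tuple[float, float, float, float]  # (A_j, B_j, C_j, D_j) per score group


def mh_chi_square(tables: Sequence[Table]) -> float:
    """Continuity-corrected Mantel-Haenszel chi-square (1 degree of freedom)."""
    sum_a = sum_expected = sum_variance = 0.0
    for a, b, c, d in tables:
        total = a + b + c + d
        if total < 2:
            continue  # variance is undefined for fewer than two candidates
        n_ref, n_focal = a + b, c + d      # group sizes in this score group
        n_right, n_wrong = a + c, b + d    # correct and incorrect totals
        sum_a += a
        sum_expected += n_ref * n_right / total
        sum_variance += (n_ref * n_focal * n_right * n_wrong
                         / (total * total * (total - 1)))
    if sum_variance == 0:
        raise ValueError("chi-square is undefined: zero variance")
    return (abs(sum_a - sum_expected) - 0.5) ** 2 / sum_variance


# Same proportions correct in both cases; only the sample size differs.
small_sample = [(60, 40, 50, 50)]
large_sample = [(600, 400, 500, 500)]
print(mh_chi_square(small_sample), mh_chi_square(large_sample))
```

With identical proportions, the tenfold larger sample exceeds the critical value for p < .01 with one degree of freedom (about 6.63) while the smaller sample does not, which is the pattern the paragraph above describes.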

Discussion 

The findings suggest that the use of a multiple method approach to 
define an alpha criterion for users of the MH method has advantages over the 
more traditional methods of using a significance level to establish this 
criterion. [illegible] greater generalizability of the findings, as well as 
greater confidence that the classification of items is accurate. 

The IRT method used in conjunction with the MH method involved assessing 
DIF by computing a t-statistic for each item that indicated the degree to 
which the difficulty of the item differed for the reference and focal group. 
As should be expected of an alternative method of assessing DIF, the 
t-statistics resulting from use of this method correlated highly with the MH 
alpha values and ordered the seven reference-group comparisons similarly in 
terms of the magnitude of DIF detected. When the IRT method was used in 
conjunction with the MH analyses, cutscores were identified that produced a 
high level of decision concordance. 

It is evident that the maximizing alpha cutscores may vary over the 
reference-focal group comparisons. In the current study, these cutscores 
ranged from 1.56 for the comparison of Ethnic Group 1 vs. Ethnic Group 7 to 
1.98 and 1.99 for the two comparisons of Ethnic Group 1 vs. Ethnic Groups 2 
and 5, respectively. It is interesting to note that the former comparison 
produced the lowest mean and standard deviation and the least skew in its 
distribution of alpha values; the latter two comparisons produced the highest 
means and standard deviations and greatest positive skews. The t-statistics 
for these two comparisons showed similar discrepant patterns. These findings 
suggest that varying degrees of DIF affect the minority groups involved in 
these comparisons, and that the nature of the DIF detected has a differential 
impact on the relationship between the alpha and t-statistic and, hence, on 
the maximizing cutscores. 

One possible explanation for the different maximizing cutscores is that 
different factors are producing the differential in potential bias evident 
across comparisons. For example, cultural differences may be active in some 
comparisons, whereas in others curriculum effects are also present. It is 
noteworthy that Ethnic Groups 2 and 5 are known to be comprised of largely 
foreign-educated candidates. 

If one alpha cutscore must be used where multiple comparisons are to be 
made, it seems most reasonable to choose the lowest of the maximizing 
cutscores that are identified for the reference-focal groups being analyzed. 
Setting the cutscore at the lowest of these values ensures that any 
misclassification errors accrued by using a "non-optimal" cutscore for any of 
the comparisons would not be disadvantageous to the minority groups in these 
comparisons. Using the "non-optimal" cutscore would result in more erroneous 
[illegible] fail to be classified because of this cutscore. 

The methodology employed in this study has been replicated on another 
examination for professional licensure prepared by CTB/McGraw-Hill, where the 
candidate population includes some of the same ethnic groups involved in the 
current study. For the reference-focal group comparisons that were the same, 
the findings from this replication were comparable to those in the current 
study. That is, high levels of decision concordance were found, and those 
comparisons with higher levels of DIF as measured by the alpha and 
t-statistics also were found to have more items classified in the first plot 
quadrant. The numbers of items in this quadrant also did not appear to be a 
function of sample size. Furthermore, in those comparisons involving 
predominantly foreign-educated candidates in the focal group, more extreme 
alphas and t-statistics as well as more items in the first plot quadrant were 
found. 

The merits of the proposed methodology rest on its generalizability and 
accuracy. It is recommended that the methodology be replicated in the future 
using other examinations and other reference-focal group comparisons. In 
addition, simulation studies should be done to assess the degree to which the 
proposed methodology effectively distinguishes between items that do and do 
not possess "true" DIF. 
Table 1 

Percentage of Candidates in Each of 13 Raw Score Groups 
and Excluded Raw Score Ranges 
by Ethnic Group 

Ethnic                                              Raw Score Group 
Group    ≤134* 135-143 144-152 153-156 157-161 162-168 169-171 172-175 176-178 179-182 183-187 188-194 195-202 203-221   ≥222* 

1          0.2     0.2     0.5     0.4     0.8     2.1     1.2     2.2     2.0     3.4     5.3     9.2    12.9    32.9    26.8 
2         18.7     6.9     8.8     4.5     6.1    10.1     4.1     5.6     4.5     5.4     6.1     6.9     5.2     5.5     1.4 
3          2.8     2.2     4.0     2.8     3.9     8.9     4.4     6.4     5.2     7.4     8.6    11.7    11.9    15.3     4.4 
4          6.4     2.8     4.1     2.5     4.0     6.4     3.6     4.3     4.4     5.4     7.5    11.2    10.6    19.0     7.7 
5         16.5     7.4    10.0     5.5     5.8    10.?     4.9     6.0     6.0     5.3     5.6     5.8     4.9     5.0     0.9 

* Cases in this score range were excluded from the analyses. 
Table 2 

Percentage of Candidates in Each of 9 Raw Score Groups 
and Excluded Raw Score Ranges 
by Ethnic Group 

Ethnic                                Raw Score Groups 
Group    <157* 157-165 166-173 174-181 182-186 187-193 194-199 200-203 204-211 212-221   >221* 

1          1.2     1.8     3.3     5.7     5.2     8.8     9.5     6.7    14.3    16.7    26.8 
6         18.3     9.5    10.?    10.9     7.8     8.6     4.7     5.1     8.0     5.1    11.7 
7          4.6     7.2     7.2     7.8     8.5     8.5    10.7     7.8     9.8    11.7    16.3 
Other     34.8     8.1    10.0    10.3     5.0     6.1     5.0     3.3     4.4     5.5     7.4 

* Cases in this score range were excluded from the analyses. 
Table 3 

Summary Statistics Describing Alpha and T-Statistic 
Values by Reference-Focal Group Comparison 

Reference-Focal                                Alpha Values                      T-Statistic Values 
Group Comparison                       Mean   S.D. Median     Q3     Q1    Mean   S.D. Median     Q3     Q1 

Ethnic Group 1 vs. Ethnic Group 2      1.20    .85    .98   1.47    .67     .23   4.93    .09   3.04  -3.49 
Ethnic Group 1 vs. Ethnic Group 3      1.06    .33   1.04   1.27    .82     .14   2.80   -.21   1.90  -2.03 
Ethnic Group 1 vs. Ethnic Group 4      1.04    .26   1.01   1.17    .86    -.14   2.31    .10   1.50  -1.79 
Ethnic Group 1 vs. Ethnic Group 5      1.29   1.08   1.00   1.65    .61     .31   5.60    .04   3.39  -4.20 
Ethnic Group 1 vs. Ethnic Group 6      1.08    .45    .98   1.27    .78     .12   3.00   -.14   1.94  -2.27 
Ethnic Group 1 vs. Ethnic Group 7      1.03    .23   1.02   1.14    .88     .03   1.31    .08    .80   -.90 
Ethnic Group 1 vs. Other               1.10    .49   1.02   1.33    .77     .19   3.55   -.07   2.44  -2.58 
Table 4 

Correlations between Mantel-Haenszel Alphas and 
T-Statistic Values by Reference-Focal Group Comparison 

Reference-Focal 
Group Comparison                       r_xy 

Ethnic Group 1 vs. Ethnic Group 2      [illegible] 
Ethnic Group 1 vs. Ethnic Group 3      [illegible] 
Ethnic Group 1 vs. Ethnic Group 4      [illegible] 
Ethnic Group 1 vs. Ethnic Group 5      .78 
Ethnic Group 1 vs. Ethnic Group 6      .80 
Ethnic Group 1 vs. Ethnic Group 7      [illegible] 
Ethnic Group 1 vs. Other               [illegible] 



Table 5

Cutscores that Maximize the Concordant Classifications
by Reference-Focal Group Comparison

                       Concordance       Concordance
Reference-Focal        Maximizing        Maximizing
Group Comparison       Alpha Cutscore    T-Statistic Cutscore

Ethnic Group 1 vs.     1.98              5.92
Ethnic Group 2

Ethnic Group 1 vs.     1.73              4.63
Ethnic Group 3

Ethnic Group 1 vs.     1.59              [illegible]
Ethnic Group 4

Ethnic Group 1 vs.     1.99              3.75
Ethnic Group 5

Ethnic Group 1 vs.     1.76              4.02
Ethnic Group 6

Ethnic Group 1 vs.     1.56              2.52
Ethnic Group 7

Ethnic Group 1 vs.     1.93              4.82
Other


Table 6

Counts of Items in Four Plot Quadrants, Total
Concordance and Concordance Statistics at Maximizing Cutscores
by Reference-Focal Group Comparison

                                                     Total Concordance
                             Plot Quadrant         Concordance   Proportion (Po)
Reference-Focal
Group Comparison          1     2      3     4       (1 + 3)      (1 + 3)/299     Kappa

Ethnic Group 1 vs.       30    10    253     6         283            .95          .76
Ethnic Group 2

Ethnic Group 1 vs.        7     2    288     2         295            .99          .77
Ethnic Group 3

Ethnic Group 1 vs.        4     6    287     2         291            .97          .49
Ethnic Group 4

Ethnic Group 1 vs.       52     3    228    16         280            .94          .81
Ethnic Group 5

Ethnic Group 1 vs.       16     4    274     5         290            .97          .76
Ethnic Group 6

Ethnic Group 1 vs.        5     2    290     2         295            .99          .71
Ethnic Group 7

Ethnic Group 1 vs.       17     3    272     7         289            .97          .76
Other

Total Across Groups     131    30   1892    40        2023            .97          .77
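
The proportion and kappa columns of Table 6 can be recomputed from the quadrant counts. The following is a minimal sketch, not the authors' own procedure: it assumes that quadrants 1 and 3 hold the items on which the two methods agree (as the "(1 + 3)" column heading implies), that quadrants 2 and 4 hold the disagreements, and that the kappa reported is Cohen's kappa. The function name `concordance_stats` is illustrative.

```python
def concordance_stats(q1, q2, q3, q4):
    """Return (Po, kappa) for a 2 x 2 agreement table given as quadrant counts.

    Assumed layout (an assumption, not stated in the table itself):
      q1 = items flagged by both methods
      q3 = items flagged by neither method
      q2, q4 = items flagged by only one of the two methods
    """
    n = q1 + q2 + q3 + q4
    po = (q1 + q3) / n                      # observed agreement, (1 + 3)/n

    flagged_a = q1 + q2                     # items flagged by one method
    flagged_b = q1 + q4                     # items flagged by the other method
    # Agreement expected by chance from the two sets of marginal totals.
    pe = (flagged_a * flagged_b + (n - flagged_a) * (n - flagged_b)) / n ** 2

    kappa = (po - pe) / (1 - pe)            # Cohen's kappa
    return po, kappa


# Ethnic Group 1 vs. Ethnic Group 2 row of Table 6.
po, kappa = concordance_stats(30, 10, 253, 6)
print(round(po, 2), round(kappa, 2))        # 0.95 0.76
```

Under these assumptions the Ethnic Group 4 row (4, 6, 287, 2) gives a kappa of about .49, which agrees with the table. Swapping quadrants 2 and 4 leaves both statistics unchanged, so the result does not depend on which method each off-diagonal quadrant belongs to.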

Table 7

Reference-Focal Group Comparisons Ranked¹ by
Mantel-Haenszel and T-Statistic Values at the Third Quartile (Q3) and
by the Number of Items in the First Quadrant of Plot

Reference-Focal        Mantel-Haenszel      T-Statistic        Items in First
Group Comparison       Alpha at Q3          at Q3              Quadrant of Plot
                       Value    Rank        Value    Rank      Number    Rank

Ethnic Group 1 vs.     1.14     1            .80     1            5      2
Ethnic Group 7

Ethnic Group 1 vs.     1.17     2           1.50     2            4      1
Ethnic Group 4

Ethnic Group 1 vs.     1.27     3.5         1.90     3            7      3
Ethnic Group 3

Ethnic Group 1 vs.     1.27     3.5         1.94     4           16      4
Ethnic Group 6

Ethnic Group 1 vs.     1.33     5           2.44     5           17      5
Other

Ethnic Group 1 vs.     1.47     6           3.04     6           30      6
Ethnic Group 2

Ethnic Group 1 vs.     1.65     7           3.39     7           52      7
Ethnic Group 5

¹ Rank orders range from a low of 1 to a high of 7.

Table 8

Number of Significant (p < .01) Mantel-Haenszel Chi-Square Values and
Significant T-Statistics Compared to Number of Items in Plot Quadrant 1
by Reference-Focal Group Comparison

                              Significant                       Significant
                       Mantel-Haenszel Chi-Squares              T-Statistics

                                         Number Showing                    Number Showing
Reference-Focal                  Total   DIF Against               Total   DIF Against      Number
Group Comparison          N      Number  Minority Group     N      Number  Minority Group   in Quad. 1

Ethnic Group 1 vs.     47,573     261        132              500    175         83             30
Ethnic Group 2          6,486                               1,000

Ethnic Group 1 vs.     47,573     221        118              500    106         54              7
Ethnic Group 3          5,466                               1,000

Ethnic Group 1 vs.     47,573     157    [illegible]          500     85         38              4
Ethnic Group 4          2,004                               1,000

Ethnic Group 1 vs.     47,573     223        113              500    189         94             52
Ethnic Group 5          1,014                               1,014

Ethnic Group 1 vs.     47,573     114         63              500    126         56             16
Ethnic Group 6            486                                 486

Ethnic Group 1 vs.     47,573      22         11              500     16          5              5
Ethnic Group 7            307                                 307

Ethnic Group 1 vs.     47,573     140         75              500    142         67             17
Other                     726                                 726



Figure 1

Table of Frequencies Used in the Mantel-Haenszel Analysis
of the Performance of a Score Group on an Item

[2 x 2 frequency table: rows are Reference Group and Focal Group; columns are Item Score 0 and Item Score 1.]
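
A 2 x 2 table like the one in Figure 1 is formed for each score group, and the tables are then pooled into a single alpha estimate for the item. The following is a minimal sketch, assuming the standard Mantel-Haenszel common odds ratio estimator. The cell names (`ref_right`, `ref_wrong`, `focal_right`, `focal_wrong`), the function name `mh_alpha`, and the counts are illustrative assumptions and are not taken from the study.

```python
def mh_alpha(score_groups):
    """Mantel-Haenszel common odds ratio across score groups.

    score_groups: iterable of (ref_right, ref_wrong, focal_right, focal_wrong)
    tuples, one 2 x 2 frequency table per score group.
    """
    numerator = 0.0
    denominator = 0.0
    for ref_right, ref_wrong, focal_right, focal_wrong in score_groups:
        total = ref_right + ref_wrong + focal_right + focal_wrong
        if total == 0:
            continue                        # an empty score group adds nothing
        numerator += ref_right * focal_wrong / total
        denominator += ref_wrong * focal_right / total
    if denominator == 0:
        raise ValueError("alpha is undefined when the denominator sum is zero")
    return numerator / denominator


# Hypothetical counts for two score groups (not data from the study).
tables = [(40, 10, 30, 20), (80, 20, 60, 40)]
print(round(mh_alpha(tables), 2))           # 2.67
```

Under the usual convention, an alpha near 1 indicates that matched reference and focal candidates have similar odds of answering the item correctly, and an alpha above 1 indicates that the odds favor the reference group.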



Figure 2

Plot of t Statistics by M.H. Alpha
Ethnic Group 2

[Plot: t statistic (vertical axis) by M.H. alpha (horizontal axis). Legend: A = 1 obs., B = 2 obs., etc.]

Figure 3

Plot of t Statistics by M.H. Alpha
Ethnic Group 3

[Plot: t statistic (vertical axis) by M.H. alpha (horizontal axis). Legend: A = 1 obs., B = 2 obs., etc.]

Figure 4

Plot of t Statistics by M.H. Alpha
Ethnic Group 4

Figure 5

Plot of t Statistics by M.H. Alpha
Ethnic Group 5

[Plot: t statistic (vertical axis) by M.H. alpha (horizontal axis). Legend: A = 1 obs., B = 2 obs., etc.]

Figure 6

Plot of t Statistics by M.H. Alpha
Ethnic Group 6

[Plot: t statistic (vertical axis) by M.H. alpha (horizontal axis). Legend: A = 1 obs., B = 2 obs., etc.]

Figure 7

Plot of t Statistics by M.H. Alpha
Ethnic Group 7

Figure 8

Plot of t Statistics by M.H. Alpha
Others